Bayesian Modelling Of Vowel Segment Duration For Text-to-Speech Synthesis Using Distinctive Features

Author

  • Olga V. Goubanova
Abstract

We apply a Bayesian belief network (BN) approach to vowel duration modelling, whereby vowel segment duration is modelled as a hybrid Bayesian network consisting of discrete and continuous nodes, with the nodes in the network representing linguistic factors that affect segment duration. Factor interaction is modelled in a concise way by causal relationships among the nodes in a directed acyclic graph (DAG). New to the present research, we model segment identity as a set of distinctive features. The features chosen were frontness, height, length, and roundness. In addition, the BNs were augmented with the word class feature (content vs. function). We experimented with different BNs, and contrasted the results of the belief network model with those of Sums-of-Products (SoP) and classification and regression tree (CART) models. We trained and tested all three models on the same data. In terms of RMS error and correlation coefficient, our BN model performs better than the CART and SoP models.
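As a rough illustration of the ideas in the abstract, the sketch below fits the simplest possible hybrid structure: one continuous duration node whose Gaussian parameters depend on two discrete parents (vowel length and word class), and then scores it with RMS error and the correlation coefficient. This is not the author's network. The data are made-up toy values, the choice of two parents is an assumption for brevity, and the names `fit_conditional_gaussian`, `predict`, `rmse`, and `pearson` are hypothetical helpers.

```python
import math
from collections import defaultdict

# Toy, invented observations: (length, word_class, duration_ms).
# These are NOT data from the paper; they only exercise the code.
TRAIN = [
    ("long", "content", 140.0),
    ("long", "content", 160.0),
    ("long", "function", 110.0),
    ("long", "function", 90.0),
    ("short", "content", 80.0),
    ("short", "content", 100.0),
    ("short", "function", 60.0),
    ("short", "function", 40.0),
]


def fit_conditional_gaussian(rows):
    """Estimate (mean, variance) of duration per discrete parent configuration."""
    groups = defaultdict(list)
    for length, word_class, duration in rows:
        groups[(length, word_class)].append(duration)
    params = {}
    for key, values in groups.items():
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        params[key] = (mean, var)
    return params


def predict(params, length, word_class):
    """Predict duration as the conditional mean for the given parent values."""
    return params[(length, word_class)][0]


def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


params = fit_conditional_gaussian(TRAIN)
observed = [d for _, _, d in TRAIN]
predicted = [predict(params, l, w) for l, w, _ in TRAIN]
error = rmse(predicted, observed)
corr = pearson(predicted, observed)
print(f"RMSE = {error:.2f} ms, r = {corr:.3f}")
```

For brevity the sketch scores the model on its own training rows; a real comparison of the kind the abstract describes would use held-out test data, and a full BN would add further discrete parents (frontness, height, roundness) and dependencies among them.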


Similar resources

Using Bayesian belief networks for model duration in text-to-speech systems

The problems of database imbalance and factor interaction make modelling of segment duration in text-to-speech systems a challenging task. We therefore propose a probabilistic Bayesian belief network (BN) approach to tackle data sparsity and factor interaction problems. The belief network approach makes good estimations in cases of missing or incomplete data. Also, it captures factor interaction...


The Relationship Between Acoustic Characteristics and Personality Dimensions in Patients With Dysphonia

Objectives: Voice is influenced by personality. However, it is still questionable which acoustic features are influenced by personality traits. This study aimed to investigate the relationship between acoustic characteristics and personality dimensions. Methods: Thirty-three participants with dysphonia and 33 participants without dysphonia were recruited to take part in this cross-sectional st...


Investigating the effect of auditory feedback on speech production after cochlear implant surgery

The main goal of this study is to determine the effects of auditory feedback on the improvement of speech production in prelingually, totally deaf children who use a cochlear implant. To this end, we recorded the speech of four prelingually deaf children with cochlear implants before and after the operation. We then extracted some static features of vowels, such as fundamental frequency, formant frequencies...


Modeling vowel duration for Japanese text-to-speech synthesis

Accurate estimation of segmental durations is crucial for natural-sounding text-to-speech (TTS) synthesis. This paper presents a model of vowel duration used in the Bell Labs Japanese TTS system. We describe the constraints on vowel devoicing, and effects of factors such as phone identity, surrounding phone identities, accentuation, syllabic structure, and phrasal position on the duration of bot...


Prosody modelling in Czech text-to-speech synthesis

This paper describes data-driven modelling of all three basic prosodic features – fundamental frequency, intensity and segmental duration – in the Czech text-to-speech system ARTIC. The fundamental frequency is generated by a model based on concatenation of automatically acquired intonational patterns. Intensity of synthesised speech is modelled by experimentally created rules which are in conf...



Publication date: 2006